PART I


Answer the following multiple choice questions (Questions 1-15) on the given 
MULTIPLE CHOICE ANSWER SHEET. Use the appropriate 2B pencil to mark 
the correct answer. 

Question 1 [1 Mark] 
We randomly select some digits between 0 and 9 with replacement. Which of the following is
not correct:
1. Each integer between 0 and 9 has probability 0.10 of being selected.
2. Each selection is independent of the other.
3. If you have selected a very large number of random digits, then each integer between
0 and 9 would occur close to 10% of the time.
4. The cumulative proportion of times that a 0 is selected tends to get close to 0.10 as
the number of selected random digits gets larger and larger.
5. If you select 10 random digits, each integer between 0 and 9 must occur exactly once.


Question 2 [1 Mark] 


Which one of the following statements is correct about a symmetric dataset? 
1. The mean and median will usually be different. 
2. The mean and median will usually be the same. 
3. The mean will usually be higher than the median. 
4. The mean will usually be lower than the median.
5. The mean could be the same, higher or lower than the median.


Question 3 [1 Mark]
Which one of the following choices describes a problem for which a linear regression model
would be appropriate?
1. Analyzing the relationship between weight and height.
2. Comparing the proportion of successes for three different treatments of anxiety. Each
treatment is tried on 100 patients.
3. Analyzing the relationship between gender and opinion about capital punishment
(favor or oppose).
4. Comparing the mean birth weights of newborn babies from three different countries.
5. Comparing the variances of newborn babies’ weights from three different countries.


Question 4 [1 Mark] 


If A and B are mutually exclusive events, which of the following statements is true:
1. P(A or B) = 0
2. P(A and B) = P(A) × P(B)
3. P(A or B) = P(A) × P(B)
4. P(A) + P(B) = 1
5. P(A or B) = P(A) + P(B)


Question 5 [1 Mark] 


Which graphical and numerical summary is not appropriate for a categorical variable? 
1. Pie chart.
2. Bar graph of frequencies.
3. Contingency table.
4. Bar graph of proportions.
5. Histogram.


Question 6 [1 Mark]
Which of the following statements is true for a 95% Confidence Interval for the mean?
1. The confidence interval includes 95% of the data.
2. If we repeat the experiment 100 times, 95 of the populations’ means will be included
in the confidence interval.
3. If we repeat the experiment 100 times, 95 of the confidence intervals will include the
population mean.
4. With 95% probability the confidence interval will include the population mean.
5. With 95% probability the population mean will fall in the confidence interval.


Question 7 [1 Mark] 
Which of these 95% confidence intervals for the difference between means represents
a significant difference at the 0.05 level?
(130615 
Pod) 
3. (-0.5, 0.5)
4. (0.1, 1) 


520,01) 


Question 8 [1 Mark]
A researcher wants to test if there are significant differences between an experimental and
a well-established blood pressure drug. He wants to conduct a 90-day clinical trial with
20 high blood pressure patients. A non-random assignment of a drug to a patient seems
appropriate because:
1. If the researcher chooses to administer a drug based on the sex of the patient, he
might be able to find a link between the performance of the drug and the sex of the
patient.
2. If he lets the patients choose which prescription to follow, the number of patients
that will stick to their given prescription will increase.
3. The sample will be representative of the population of high blood pressure patients.
4. All patients need to take the new drug in order to generate more data.
5. None of the above statements are correct.


Question 9 [1 Mark] 
We want to compare the average hours of sleep for 20 STAT100 T2 students (mean 6.3h,
s.d. 1h) versus 20 STAT100 T3 students (mean 6.5h, s.d. 0.9h). Which procedure is the most
appropriate?
1. Paired t-test.
2. Unpooled t-test.
3. Z-score test.
4. Pooled t-test.
5. ANOVA.


Question 10 [1 Mark]
The value of the correlation coefficient between two variables (A, B) is r = -0.8. Which
of the statements is true:
1. A negative value is not valid.
2. The value indicates a strong positive association.
3. The variable B explains 80% of the variation of the A variable.
4. The variable B explains 64% of the variation of the A variable.
5. The variable B explains 20% of the variation of the A variable.


Question 11 [1 Mark] 
A cycling club wants to place an order for new bicycles and tries to assess two new bicycle 
models Spott and Titan. The Spott model is cheaper than the Titan and the club wants 
to see if the extra cost presents significant advantages. Each of the 22 athletes of the club 
does one lap with the Spott model and one lap with the Titan model in a random order. 
The club records the differences of the lap times and uses an appropriate statistical test 
to see if they are significantly different from zero. Select the degrees of freedom that
are appropriate in this case:


STAT100 Special Trimester 3, 2015 


Question 12 [1 Mark] 
Using the plots of Figure 1, identify which one depicts a regression model with line equation 


y=o= 2: 


Figure 1: Regression models 


Question 13 [1 Mark] 
The distribution of 2-bedroom apartment prices in the centre of a big city is assumed to
follow a Normal distribution with mean $1,000,000 and standard deviation $100,000. What
proportion of the flats do you expect to be listed outside the $900,000 to $1,100,000 price
range?
1. 95%
2. 68%
3. 50%
4. 32%
5. 5%
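
The price range in Question 13 spans one standard deviation either side of the mean, so the proportion outside it can be checked numerically. The following is a minimal Python sketch using only the standard library; the helper name `normal_cdf` is introduced here for illustration and is not part of the paper.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability P(X <= x) for a Normal(mean, sd) variable."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


# Values stated in Question 13.
mean, sd = 1_000_000, 100_000

# Proportion of prices inside the range, then its complement.
inside = normal_cdf(1_100_000, mean, sd) - normal_cdf(900_000, mean, sd)
outside = 1.0 - inside

print(f"inside the range:  {inside:.4f}")
print(f"outside the range: {outside:.4f}")
```

The result can be compared with the rounded percentages offered in the options, which follow the usual 68% rule of thumb.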


Question 14 [1 Mark]
A market research company wants to test if a soft drink A is more popular than a soft
drink B. In a questionnaire with 100 responses, 55% answered that they prefer drink A and
45% that they prefer drink B. Choose the combination of null and alternative hypotheses
that best addresses this test:


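1. \(H_0: p_A = p_B\), \(H_1: p_A > p_B\)
2. \(H_0: p_A = p_B\), \(H_1: p_A \neq p_B\)
3. \(H_0: \mu_A = \mu_B\), \(H_1: \mu_A > \mu_B\)
4. \(H_0: p_A \neq p_B\), \(H_1: p_A = p_B\)
5. \(H_0: \hat{p}_A = \hat{p}_B\), \(H_1: \hat{p}_A > \hat{p}_B\)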


Question 15 [1 Mark] 
Which of the following statements is a necessary condition for a Chi-Square Goodness of
Fit test:
1. All expected counts should be between 1 and 5.
2. All expected counts should be greater than 5.
3. All expected counts should be greater than 1.
4. All expected counts should be greater than 0.
5. At least 80% of the counts should be greater than 1.


PART II


Answer the remaining questions on the given answer booklet 


Question 16 [6 Marks] 
The height of a dog species is considered to follow a Normal distribution with mean
\(\mu\) = 68.5cm and standard deviation 1.5cm. Consider two random variables X, Y which
express the corresponding heights of two unrelated dogs from the same species.
(a) State the mean, the standard deviation and the distribution of \(D = X - Y\). [2]
(b) Using Figure 2, calculate the probability that the height of a dog exceeds 70cm. [2]
(c) Assume that a random sample of 25 dogs was taken from the population, and the sample
mean \(\bar{X}\) is calculated. State the sampling distribution of the sample mean. [2]


[Figure: CDF of the Standard Normal Distribution; vertical axis: Cumulative Probability, 0 to 1]


Figure 2: Standard Normal Cumulative Distribution 
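
The quantities asked for in Question 16 can be cross-checked numerically instead of reading them off the CDF plot. The sketch below is a hedged illustration in Python, assuming the two heights are independent (the question calls the dogs unrelated); all variable names are chosen here and are not part of the paper.

```python
import math

# Values stated in Question 16.
mu, sigma = 68.5, 1.5

# (a) Difference of two independent Normal heights: means subtract, variances add.
mean_d = mu - mu
sd_d = math.sqrt(sigma**2 + sigma**2)

# (b) P(X > 70): standardise, then use the standard Normal CDF via math.erf.
z = (70 - mu) / sigma
p_exceed = 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# (c) Standard deviation of the sample mean for n = 25 dogs.
n = 25
se_mean = sigma / math.sqrt(n)

print(f"D: mean {mean_d}, sd {sd_d:.4f}")
print(f"z = {z:.2f}, P(X > 70) = {p_exceed:.4f}")
print(f"sample mean: mean {mu}, sd {se_mean:.2f}")
```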


Question 17 [7 Marks]
Table 1 gives the counts of voting intentions from a 1991 US survey. These counts are
categorized according to political party (Democrat, Independent, Republican) and the sex
of the subject (Male, Female).


      Democrat  Independent  Republican
F          762          327         468
M          484          239         477


Table 1: Voting intentions from a 1991 survey. 


(a) Name a graphical summary that could help you to compare the distribution of the
voting intentions of Males vs. Females. [1]

(b) We want to investigate if the voting intentions depend on the sex of the voter. We continue
the analysis using a Chi-squared (\(\chi^2\)) test.

(i) Explain why constructing confidence intervals for proportions is not appropriate in
this setting. Propose a modified research question which could use the confidence
intervals for proportions. [2]

(ii) Formulate the Null and the Alternative hypothesis. Use mathematical notation for
full marks. [2]

(iii) Given the following output state your conclusion. [2]


Pearson’s Chi-squared test 


X-squared = 30.0701, df = 2, p-value = 2.954e-07 
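
The reported statistic can be reproduced from the counts in Table 1. The sketch below is a minimal Python illustration that takes the Independent count for F as 327 and the Republican count for M as 477; these two readings are an assumption, chosen because they reproduce the reported X-squared value. The closed-form p-value used here, exp(-x/2), holds only for 2 degrees of freedom.

```python
import math

# Observed counts (rows: F, M; columns: Democrat, Independent, Republican).
# The values 327 and 477 are assumed readings of Table 1.
observed = [
    [762, 327, 468],
    [484, 239, 477],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected,
# with expected = row total * column total / grand total.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability of a chi-squared variable; this formula is valid for df == 2 only.
p_value = math.exp(-chi2 / 2.0)

print(f"X-squared = {chi2:.4f}, df = {df}, p-value = {p_value:.3e}")
```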




Question 18 [9 Marks]
The box below contains the output for a linear regression model relating the academic
performance (GPA) of a student with the hours spent per week watching television.


Residuals:
     Min       1Q   Median       3Q      Max
-2.38953 -0.40010  0.06751  0.37037  1.09702

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.940408   0.060703  48.439   <2e-16
TV          -0.002879   0.004468  -0.644     0.52

Residual standard error: 0.5964 on 162 degrees of freedom
Multiple R-squared:  0.002556,  Adjusted R-squared:  -0.003601
F-statistic: 0.4151 on 1 and 162 DF,  p-value: 0.5203


(a) State the equation of the model above. [1]
(b) Interpret the intercept and slope coefficients. [2]
(c) Interpret the diagnostic plots of Figure 3 and comment on the model fit. [3]

[Figure: two diagnostic panels, Residuals vs Fitted (horizontal axis: Fitted values) and
Normal Q-Q (horizontal axis: Theoretical Quantiles)]


Figure 3: Diagnostic Plots 


(d) Construct a 95% confidence interval for the slope and comment on the relation between TV
and academic performance. You can use the formula: [3]

slope ± z* × (standard error of slope), where z* = 1.96.
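
The interval in part (d) is a matter of arithmetic on two numbers from the regression output. A minimal Python sketch, assuming the TV estimate and its standard error are read from the Coefficients table and using the z* = 1.96 given in the question; the variable names are chosen here for illustration.

```python
# Slope estimate and standard error for TV, taken from the regression output.
slope = -0.002879
se_slope = 0.004468
z_star = 1.96  # multiplier given in the question

# Interval: slope plus or minus z* times the standard error of the slope.
margin = z_star * se_slope
lower, upper = slope - margin, slope + margin

# The reported t value is the estimate divided by its standard error.
t_value = slope / se_slope

print(f"95% CI for the slope: ({lower:.6f}, {upper:.6f})")
print(f"t value: {t_value:.3f}")
print("interval contains zero:", lower < 0 < upper)
```

Whether the interval contains zero is the point to carry into the comment on the relation between TV hours and GPA.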


Question 19 [8 Marks]
Many car manufacturers recommend a tire pressure of 32 psi (pounds per square
inch). At a roadside vehicle safety checkpoint, officials plan to randomly select 50 cars
for which this is the recommended tire pressure and measure the actual tire pressure in
the front left tire. They want to know whether drivers on average have too little pressure
in their tires. Suppose that the experiment is conducted, and the mean and standard
deviation for the 50 cars tested are 30.1 psi and 3 psi, respectively.


[Figure: CDF of the sampling distribution; vertical axis: Cumulative Probability, 0 to 1]


Figure 4: CDF of the sampling distribution. 


(a) Propose an appropriate test and justify it. [1]
(b) State the null and alternative hypothesis. [2]
(c) State the appropriate degrees of freedom. [1]
(d) Calculate the appropriate statistic. [2]
(e) Using Figure 4 write an informative conclusion. [2]
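
For parts (c) and (d), the statistic follows the one-sample form on the formulae sheet, t = (sample mean - hypothesised mean) / (s / sqrt(n)). The Python sketch below assumes a one-sample t-test against the recommended 32 psi, which is one reasonable reading of the question; the variable names are chosen here for illustration.

```python
import math

# Values stated in Question 19.
n = 50        # cars sampled
xbar = 30.1   # sample mean pressure (psi)
s = 3.0       # sample standard deviation (psi)
mu0 = 32.0    # recommended pressure, used as the hypothesised mean

# Standard error of the sample mean and the resulting t statistic.
se = s / math.sqrt(n)
t_stat = (xbar - mu0) / se
df = n - 1

print(f"se = {se:.4f}, t = {t_stat:.3f}, df = {df}")
```

A negative statistic points in the direction of too little pressure; its cumulative probability would then be read from Figure 4.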


Question 20 [5 Marks]
We have three types of wine (A, B and C), and we would like to know which one is the
most popular. We asked 22 friends to taste each of the three wines, blinded, and then to
give a grade of 1 (worst taste) to 7 (best taste). Each person rates each wine 5 times and
the mean of the scores is reported. Table 2 contains the numerical summaries of the scores
and Figure 5 the boxplots.


        mean    sd   IQR    0%   25%   50%   75%  100%
Wine A  5.54  0.27  0.24  5.05  5.41  5.50  5.65  6.30
Wine B  5.53  0.26  0.14  5.00  5.46  5.53  5.60  6.30
Wine C  5.46  0.27  0.20  4.95  5.35  5.45  5.55  6.25


Table 2: Summaries of wine scores. 


[Figure: boxplots of the scores for Wine A, Wine B and Wine C]


Figure 5: Boxplots of wine scores. 


(a) Using the numerical summaries above comment on the wines’ scores. [1] 


(b) Interpret the box plots and relate your interpretation to your comments in (a). [1] 


Question 20 continued


(c) We would like to test if there are any significant differences between the three wines’ scores
and we decided to continue our analysis using an ANOVA procedure. The output in R is:

            Df Sum Sq Mean Sq F value Pr(>F)
Wine         2  0.094 0.04686   0.651  0.525
Residuals   63  4.532 0.07193

(i) State the null and the alternative hypothesis. [1]
(ii) Check if the conditions for an ANOVA procedure are met. Interpret the R output
and write an informative conclusion. [2]

Hint: In your answers consider any differences that you may observe in the IQRs.
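
The entries of the ANOVA table are linked by simple arithmetic, which can help when interpreting the output. The Python sketch below recomputes the degrees of freedom and the F value from the numbers shown, assuming 3 wines and 22 mean scores per wine as described in the question; the variable names are chosen here for illustration.

```python
# Design described in Question 20: 3 wines, 22 tasters, one mean score per taster and wine.
groups = 3
tasters = 22
n_total = groups * tasters

# Degrees of freedom for the Wine and Residuals rows.
df_wine = groups - 1
df_resid = n_total - groups

# Mean squares as printed in the R output.
ms_wine = 0.04686
ms_resid = 0.07193

# F is the ratio of the between-group to the within-group mean square.
f_value = ms_wine / ms_resid

print(f"df = ({df_wine}, {df_resid}), F = {f_value:.3f}")
```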


Formulae are given on page 15 


Please remember: This examination question paper MUST BE HANDED IN. Failure
to do so may result in the cancellation of all marks for this examination. Writing your
name and number on the front will help us confirm that your paper has been returned.
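

Formulae

\[ P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} \qquad P(A \mid B) = P(A) \text{ if A and B are independent} \]
\[ P(A \text{ and } B) = P(A)\,P(B) \text{ if A and B are independent} \]
\[ z = \frac{x - \mu}{\sigma} \qquad z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
\[ t = \frac{\bar{x} - \mu}{\mathrm{se}(\bar{x})} \qquad \mathrm{se}(\bar{x}) = \frac{s}{\sqrt{n}} \]
\[ t = \frac{\bar{d} - \mu_d}{\mathrm{se}(\bar{d})} \qquad \mathrm{se}(\bar{d}) = \frac{s_d}{\sqrt{n}} \]
\[ \bar{x} \pm t^* \times \mathrm{se}(\bar{x}) \qquad \mathrm{se}(\bar{x}) = \frac{s}{\sqrt{n}} \qquad \mathrm{se}(\bar{x}_i) = \sqrt{\frac{MSE}{n_i}} \]
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\mathrm{se}(\bar{x}_1 - \bar{x}_2)} \qquad \mathrm{se}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
\[ (\bar{x}_1 - \bar{x}_2) \pm t^* \times \mathrm{se}(\bar{x}_1 - \bar{x}_2) \qquad \mathrm{se}(\bar{x}_1 - \bar{x}_2) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \]
\[ \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \qquad \text{margin of error} = z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]
\[ (\hat{p}_1 - \hat{p}_2) \pm z^* \times \mathrm{se}(\hat{p}_1 - \hat{p}_2) \qquad \mathrm{se}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \]
\[ \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} \qquad E_i = n p_i \]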
gay ay) ass 
\[ t = \frac{b_1}{\mathrm{se}(b_1)} \]